Havens: Explicit Reliable Memory Regions for HPC Applications
Supporting error resilience in future exascale-class supercomputing systems
is a critical challenge. Due to transistor scaling trends and increasing memory
density, scientific simulations are expected to experience more interruptions
caused by transient errors in the system memory. Existing hardware-based
detection and recovery techniques will be inadequate to manage the presence of
high memory fault rates.
In this paper we propose a partial memory protection scheme based on
region-based memory management. We define the concept of regions called havens
that provide fault protection for program objects. We provide reliability for
the regions through a software-based parity protection mechanism. Our approach
enables critical program objects to be placed in these havens. The fault
coverage provided by our approach is application agnostic, unlike
algorithm-based fault tolerance techniques.
Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16),
September 2016, Waltham, MA, US
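The abstract does not describe how the parity mechanism is laid out, so the following is only a minimal sketch of the general idea of a software parity-protected region. The `Haven` class, its fixed-size blocks, and the single XOR parity block are illustrative assumptions, not the authors' implementation.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class Haven:
    """Hypothetical protected region: fixed-size blocks plus one XOR parity block."""

    def __init__(self, block_size: int, num_blocks: int) -> None:
        self.block_size = block_size
        self.blocks = [bytearray(block_size) for _ in range(num_blocks)]
        self.parity = bytearray(block_size)  # XOR of all blocks (all zero at start)

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must fill exactly one block")
        # Incremental parity update: XOR out the old contents, XOR in the new.
        old = bytes(self.blocks[index])
        self.parity = bytearray(xor_bytes(xor_bytes(bytes(self.parity), old), data))
        self.blocks[index][:] = data

    def verify(self) -> bool:
        """Return True if the stored parity matches the XOR of all blocks."""
        acc = reduce(xor_bytes, (bytes(b) for b in self.blocks), bytes(self.block_size))
        return acc == bytes(self.parity)

    def rebuild(self, index: int) -> None:
        """Reconstruct one block from the parity block and the remaining blocks."""
        others = (bytes(b) for i, b in enumerate(self.blocks) if i != index)
        self.blocks[index][:] = reduce(xor_bytes, others, bytes(self.parity))
```

In this sketch, parity alone only signals that some block is corrupted; `rebuild` assumes the faulty block has been located by other means, for example a per-block checksum.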
Weak Secrecy in the Multi-Way Untrusted Relay Channel with Compute-and-Forward
We investigate the problem of secure communications in a Gaussian multi-way
relay channel applying the compute-and-forward scheme using nested lattice
codes. All nodes employ half-duplex operation and can exchange confidential
messages only via an untrusted relay. The relay is assumed to be honest but
curious, i.e., an eavesdropper that conforms to the system rules and applies
the intended relaying scheme. We start with the general case of the
single-input multiple-output (SIMO) L-user multi-way relay channel and provide
an achievable secrecy rate region under a weak secrecy criterion. We show that
the securely achievable sum rate is equivalent to the difference between the
computation rate and the multiple access channel (MAC) capacity. Particularly,
we show that all nodes must encode their messages such that the common
computation rate tuple falls outside the MAC capacity region of the relay. We
provide results for the single-input single-output (SISO) and the
multiple-input single-output (MISO) L-user multi-way relay channel as well as
the two-way relay channel. We discuss these results and show the dependency
between channel realization and achievable secrecy rate. We further compare our
result to available results in the literature for different schemes and show
that the proposed scheme operates close to the compute-and-forward rate without
secrecy.
Comment: submitted to JSAC Special Issue on Fundamental Approaches to Network
Coding in Wireless Communication System
Shrink or Substitute: Handling Process Failures in HPC Systems using In-situ Recovery
Efficient utilization of today's high-performance computing (HPC) systems
with complex hardware and software components requires that the HPC
applications are designed to tolerate process failures at runtime. With the low
mean time to failure (MTTF) of current and future HPC systems, long-running
simulations on these systems require capabilities for gracefully handling
process failures by the applications themselves. In this paper, we explore the
use of fault tolerance extensions to Message Passing Interface (MPI) called
user-level failure mitigation (ULFM) for handling process failures without the
need to discard the progress made by the application. We explore two
alternative recovery strategies, which use ULFM along with application-driven
in-memory checkpointing. In the first case, the application is recovered with
only the surviving processes, and in the second case, spares are used to
replace the failed processes, such that the original configuration of the
application is restored. Our experimental results demonstrate that graceful
degradation is a viable alternative for recovery in environments where spares
may not be available.
Comment: 26th Euromicro International Conference on Parallel, Distributed and
network-based Processing (PDP 2018)
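The difference between the two recovery strategies can be illustrated with a toy rank-mapping model. This is a sketch only: it makes no MPI or ULFM calls, and the function names `shrink` and `substitute` and the process labels are hypothetical.

```python
def shrink(world, failed):
    """Continue with the survivors only; ranks are renumbered densely."""
    survivors = [proc for proc in world if proc not in failed]
    return {proc: rank for rank, proc in enumerate(survivors)}


def substitute(world, failed, spares):
    """Replace each failed process with a spare so every rank stays occupied."""
    spares = list(spares)
    needed = sum(1 for proc in world if proc in failed)
    if needed > len(spares):
        raise ValueError("not enough spare processes")
    mapping = {}
    for rank, proc in enumerate(world):
        # A spare inherits the rank of the failed process it replaces.
        replacement = spares.pop(0) if proc in failed else proc
        mapping[replacement] = rank
    return mapping


world = ["p0", "p1", "p2", "p3"]
print(shrink(world, {"p2"}))              # {'p0': 0, 'p1': 1, 'p3': 2}
print(substitute(world, {"p2"}, ["s0"]))  # {'p0': 0, 'p1': 1, 's0': 2, 'p3': 3}
```

Shrinking leaves the application with fewer ranks, so the work must be redistributed; substitution keeps the original configuration but depends on spares being available.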
Epidemic failure detection and consensus for extreme parallelism
Future extreme-scale high-performance computing systems will be required
to work under frequent component failures. The MPI Forum’s User
Level Failure Mitigation proposal has introduced an operation,
MPI_Comm_shrink, to synchronize the alive processes on the list of failed
processes, so that applications can continue to execute even in the presence
of failures by adopting algorithm-based fault tolerance techniques. This
MPI_Comm_shrink operation requires a failure detection and consensus
algorithm. This paper presents three novel failure detection and consensus
algorithms using Gossiping. Stochastic pinging is used to quickly detect
failures during the execution of the algorithm. Failures are then disseminated
to all the fault-free processes in the system, and consensus on the
failures is detected using the three consensus techniques. The proposed
algorithms were implemented and tested using the Extreme-scale Simulator.
The results show that the stochastic pinging detects all the failures in
the system. In all the algorithms, the number of Gossip cycles to achieve
global consensus scales logarithmically with system size. The second algorithm
also shows better scalability in terms of memory and network
bandwidth usage and a perfect synchronization in achieving global consensus.
The third approach is a three-phase distributed failure detection
and consensus algorithm and provides consistency guarantees even in very
large and extreme-scale systems while at the same time being memory and
bandwidth efficient.
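The abstract does not give the details of the three algorithms, so the following is a simplified simulation of the general idea only: random pinging to detect failures, followed by gossip to spread the failure list. The function `gossip_consensus`, its parameters, and the combined ping-and-push step are assumptions for illustration. Note that the stopping check here uses the simulator's global view, whereas the paper's algorithms detect consensus in a distributed way.

```python
import random


def gossip_consensus(num_procs, failed, seed=0, max_cycles=10_000):
    """Simulate gossip cycles until every alive process knows all failed ones.

    Returns (cycles, views), where views maps each alive process to the set
    of processes it believes have failed.
    """
    if num_procs < 2:
        raise ValueError("need at least two processes")
    rng = random.Random(seed)
    failed = set(failed)
    alive = [p for p in range(num_procs) if p not in failed]
    views = {p: set() for p in alive}

    for cycle in range(1, max_cycles + 1):
        for p in alive:
            # Stochastic ping: pick any other process uniformly at random.
            target = rng.choice([q for q in range(num_procs) if q != p])
            if target in failed:
                # No reply (modelled as an immediate timeout): record the failure.
                views[p].add(target)
            else:
                # The ping doubles as a gossip message carrying p's failure list.
                views[target] |= views[p]
        # Omniscient check, used only to stop the simulation.
        if all(view == failed for view in views.values()):
            return cycle, views
    raise RuntimeError("no consensus within max_cycles")


cycles, views = gossip_consensus(num_procs=64, failed={3, 17, 42})
print(f"all alive processes agree after {cycles} gossip cycles")
```

Running the function for increasing `num_procs` gives a rough feel for how the cycle count grows with system size in this toy model; it is not a reproduction of the paper's measurements.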
Business Succession in the Saxon Skilled Crafts Sector
Skilled-crafts businesses in Saxony face a wide range of challenges. Defending their competitive position against other craft businesses from Germany and from eastern European countries is proving difficult. In addition, many businesses face a generational change in the coming years. The emerging demographic trend makes the issue of generational change even more pressing. The Dresden branch of the ifo Institute for Economic Research (ifo Institut für Wirtschaftsforschung) prepared a study on the problems of business succession in Saxony on behalf of the Saxon State Ministry for Economic Affairs and Labour (Sächsisches Staatsministerium für Wirtschaft und Arbeit). This article summarizes the main findings relating specifically to Saxon skilled-crafts businesses, which were also published in the Saxon SME report (Sächsischer Mittelstandsbericht) 2005/2006.
Business succession; skilled crafts; Saxony
Context-Aware Technology Mapping in Genetic Design Automation
Genetic design automation (GDA) tools hold promise to speed up
circuit design in synthetic biology. Their widespread adoption is
hampered by their limited predictive power, resulting in frequent
deviations between the in silico and in vivo performance of a genetic
circuit. Context effects, i.e., changes in overall circuit functioning
due to the intracellular environment of the host and due to crosstalk
among circuit components, are believed to be a major source of the
aforementioned deviations. Incorporating these effects in computational
models of GDA tools is challenging but is expected to boost their
predictive power and hence their deployment. Using fine-grained thermodynamic
models of promoter activity, we show in this work how to account for
two major components of cellular context effects: (i) crosstalk due
to limited specificity of used regulators and (ii) titration of circuit
regulators to off-target binding sites on the host genome. We show
how we can compensate for the incurred increase in computational complexity
through dedicated branch-and-bound techniques during the technology
mapping process. Using the synthesis of several combinational logic
circuits based on Cello’s device library as a case study, we
analyze the effect of different intensities and distributions of crosstalk
on circuit performance and on the usability of a given device library
- …